
[hybrid performance] softmax mask fuse upper triangle #33981

Merged

Conversation

@FeixLiu (Contributor) commented Jul 6, 2021

PR types

New features

PR changes

OPs

Describe

Softmax mask fuse upper triangle.

We observe that, for GPT-style models, the attention mask is always an upper-triangular matrix that masks out the upper-triangular part of the QK product.

To save the time spent creating the mask and the host-to-device (HtoD) transfer time for the mask matrix (and possibly the time spent communicating the mask between pipeline-parallel stages), we fuse the softmax and the upper-triangular mask into a single op.

Without this fusion:

# prepare QK and mask
QK_mask = QK + mask
rst = softmax(QK_mask)

With this fusion:

# prepare QK
rst = softmax_mask_fuse_upper_triangle(QK)
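
For reference, the fused op computes the same result as masking the strictly upper triangle with -inf and then applying softmax along the last axis. Below is a minimal NumPy sketch of this unfused semantics (illustrative only; the helper name and mask construction are ours, not the kernel code):

import numpy as np

def softmax_mask_fuse_upper_triangle_ref(qk):
    # qk: attention scores, shape [batch, num_heads, seq_len, seq_len]
    seq_len = qk.shape[-1]
    # True on the strictly upper triangle, i.e. the masked (future) positions
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(mask, -np.inf, qk)
    # numerically stable softmax over the last axis
    masked = masked - masked.max(axis=-1, keepdims=True)
    e = np.exp(masked)
    return e / e.sum(axis=-1, keepdims=True)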

Performance gain (Static mode)

| Model size | AMP | Hybrid config: GPUs (dp, pp, mp) | Before fusion | After fusion | Gain |
|------------|-------|----------------------------------|---------------|--------------|--------|
| 117M | True  | 1 (1, 1, 1) | 32153 | 35493 | +10.4% |
| 117M | False | 1 (1, 1, 1) | 11304 | 12225 | +8.1%  |
| 117M | True  | 4 (1, 1, 4) | 73757 | 78592 | +6.5%  |
| 117M | False | 4 (1, 1, 4) | 33674 | 35333 | +4.9%  |
| 117M | True  | 4 (1, 4, 1) | 43388 | 44514 | +2.6%  |
| 117M | False | 4 (1, 4, 1) | 21790 | 22583 | +3.6%  |
| 117M | True  | 8 (2, 2, 2) | 58573 | 62136 | +6.1%  |
| 117M | False | 8 (2, 2, 2) | 41666 | 43047 | +3.3%  |

Precision check

[screenshot in the original PR: precision comparison of fused vs. unfused outputs; image not preserved in this extract]

How to use

For dygraph:

import paddle.fluid as fluid
import paddle.incubate as incubate
import numpy as np

# attention scores (QK product), shape [batch, num_heads, seq_len, seq_len]
x_in_np = np.random.random((1, 1, 32, 32)).astype("float32")
input_x = fluid.dygraph.to_variable(x_in_np)
# fused upper-triangular masking + softmax in one kernel
rst = incubate.softmax_mask_fuse_upper_triangle(input_x)
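
To sanity-check the fused op against the unfused path, it can be compared with an explicit additive mask followed by a plain softmax. A minimal sketch continuing the dygraph snippet above; the -10000.0 additive-mask value and the tolerance are our illustrative choices, not values taken from this PR:

import paddle

# unfused reference: additive upper-triangular mask + plain softmax
mask_np = np.triu(np.ones((32, 32), dtype="float32"), k=1) * -10000.0
mask = fluid.dygraph.to_variable(mask_np)
ref = paddle.nn.functional.softmax(input_x + mask, axis=-1)
print(np.allclose(ref.numpy(), rst.numpy(), atol=1e-6))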

For static mode:

import paddle
import paddle.fluid as fluid
import paddle.incubate as incubate
import numpy as np

paddle.enable_static()
# attention scores (QK product), shape [batch, num_heads, seq_len, seq_len]
input_x = fluid.data(name="x", shape=[1, 1, 32, 32], dtype="float16")
rst = incubate.softmax_mask_fuse_upper_triangle(input_x)

x_in_np = np.random.random((1, 1, 32, 32)).astype("float16")
# the fused kernel runs on GPU, so execute on a CUDA place
exe = fluid.Executor(fluid.CUDAPlace(0))
fetches = exe.run(fluid.default_main_program(),
                  feed={"x": x_in_np},
                  fetch_list=[rst])

@paddle-bot-old (bot) commented Jul 6, 2021

Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

@ForFishes (Member) previously approved these changes Jul 8, 2021

LGTM

@ForFishes (Member) left a comment

LGTM

@XiaoguangHu01 (Contributor) left a comment

LGTM

@PangHua left a comment

LGTM

@ForFishes ForFishes merged commit e2e1c57 into PaddlePaddle:develop Jul 12, 2021
@FeixLiu FeixLiu deleted the softmax_mask_fuse_upper_triangle branch July 13, 2021 05:41
@FeixLiu FeixLiu changed the title softmax mask fuse upper triangle [hybrid performance] softmax mask fuse upper triangle Oct 11, 2021